{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Deferred Initialization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There’s no way Pytorch(or any other framework for that matter) could predict what the input dimensionality of a network would be. Later on, when working with convolutional networks and images this problem will become even more pertinent, since the input dimensionality (i.e. the resolution ofan image) will affect the dimensionality of subsequent layers at a long range. Hence, the ability to set parameters without the need to know at the time of writing the code what the dimensionality is can greatly simplify statistical modeling. In what follows, we will discuss how this works using initialization as an example. After all, we cannot initialize variables that we don’t know exist."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Instantiating a Network"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn as nn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "def getnet(in_features,out_features):\n",
    "    net=nn.Sequential(\n",
    "    nn.Linear(in_features,256),\n",
    "    nn.ReLU(),\n",
    "    nn.Linear(256,out_features))\n",
    "    \n",
    "    return net\n",
    "\n",
    "net=getnet(20,10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### In pytorch it is not possible to define a layer without mentioning the in_features for that layer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.weight torch.Size([256, 20])\n",
      "0.bias torch.Size([256])\n",
      "2.weight torch.Size([10, 256])\n",
      "2.bias torch.Size([10])\n"
     ]
    }
   ],
   "source": [
    "for name,param in net.named_parameters():\n",
    "    print(name,param.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now it is possible to make a network learn the size from a input data by making a custom nn module using pytorch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " ## Deferred Initialization in Practice\n",
    "Now that we know how it works in theory, let’s see when the initialization is actually triggered. In order to do so, we mock\n",
    "up an initializer.The initializer **init_weights** when evoked it initializes the weight of the network.It also sets the weights of the neural network to a non zero value which helps as neural networks tend to get stuck in local minima, so it's a good idea to give them many different starting values. You can't do that if they all start at zero.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Init Linear(in_features=20, out_features=256, bias=True)\n",
      "Init ReLU()\n",
      "Init Linear(in_features=256, out_features=10, bias=True)\n",
      "Init Sequential(\n",
      "  (0): Linear(in_features=20, out_features=256, bias=True)\n",
      "  (1): ReLU()\n",
      "  (2): Linear(in_features=256, out_features=10, bias=True)\n",
      ")\n",
      "Parameter containing:\n",
      "tensor([[-0.1195, -0.2153, -0.0631,  ..., -0.1096,  0.0227,  0.1643],\n",
      "        [-0.0488,  0.0115,  0.2134,  ...,  0.0454, -0.0763,  0.0681],\n",
      "        [ 0.1453,  0.0349, -0.2226,  ...,  0.1600,  0.0137, -0.0916],\n",
      "        ...,\n",
      "        [-0.1276, -0.1720,  0.0265,  ..., -0.0643,  0.0372, -0.0955],\n",
      "        [-0.1196,  0.1300, -0.0260,  ..., -0.1641, -0.1030, -0.1028],\n",
      "        [-0.1228, -0.0825,  0.0319,  ...,  0.1481, -0.1722, -0.1507]],\n",
      "       requires_grad=True)\n",
      "Parameter containing:\n",
      "tensor([[ 0.0606, -0.0366,  0.0618,  ...,  0.0569, -0.0259,  0.0583],\n",
      "        [ 0.0307,  0.0358, -0.0488,  ..., -0.0109,  0.0399, -0.0073],\n",
      "        [-0.0205, -0.0593, -0.0304,  ...,  0.0432, -0.0613, -0.0263],\n",
      "        ...,\n",
      "        [-0.0024, -0.0565, -0.0290,  ..., -0.0335,  0.0229, -0.0048],\n",
      "        [-0.0218, -0.0559, -0.0385,  ...,  0.0155, -0.0362, -0.0012],\n",
      "        [ 0.0383, -0.0266, -0.0451,  ..., -0.0576, -0.0387,  0.0158]],\n",
      "       requires_grad=True)\n"
     ]
    }
   ],
   "source": [
    "def init_weights(m):\n",
    "    print(\"Init\",m)\n",
    "    \n",
    "net.apply(init_weights)\n",
    "\n",
    "print(net[0].weight)\n",
    "print(net[2].weight)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.weight torch.Size([256, 20])\n",
      "0.bias torch.Size([256])\n",
      "2.weight torch.Size([10, 256])\n",
      "2.bias torch.Size([10])\n"
     ]
    }
   ],
   "source": [
    "x=torch.rand((2,20))\n",
    "y=net(x) # Forward computation\n",
    "for name,param in net.named_parameters():\n",
    "    print(name,param.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The main difference to before is that as soon as we knew the input dimensionality, x is R 20 it was possible to define the\n",
    "weight matrix for the first layer, i.e. W1 is R 256 * 20. With that out of the way, we can progress to the second layer, define\n",
    "its dimensionality to be 10 * 256 and so on through the computational graph and bind all the dimensions as they become\n",
    "available. Once this is known, we can proceed by initializing parameters."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As mentioned at the beginning of this section, deferred initialization can also cause confusion. Before the first forward\n",
    "calculation, we were unable to directly manipulate the model parameters, for example, we could not use the data and\n",
    "set_data functions to get and modify the parameters. Therefore, we often force initialization by sending a sample\n",
    "observation through the network."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Forced Initialization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Deferred initialization does not occur if the system knows the shape of all parameters when calling the initialize\n",
    "function. This can occur in two cases:\n",
    "1. We’ve already seen some data and we just want to reset the parameters.\n",
    "2. We specified all input and output dimensions of the network when defining it.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first case works just fine, as illustrated below.\n",
    "\n",
    "once we see some data and define the parameters and after that we initialize those parameters again using **init_weights** function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sequential(\n",
      "  (Linear1): Linear(in_features=20, out_features=256, bias=True)\n",
      "  (Linear2): Linear(in_features=256, out_features=10, bias=True)\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "net1=nn.Sequential()\n",
    "net1.add_module(\"Linear1\",nn.Linear(20,256))\n",
    "net1.add_module(\"Linear2\",nn.Linear(256,10))\n",
    "print(net1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "def init_weights_forced(m):\n",
    "    if type(m) == nn.Linear:\n",
    "        torch.nn.init.xavier_uniform_(m.weight)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In second case we specify the remaining set of parameters when initializing the network"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Linear1.weight torch.Size([256, 20])\n",
      "Linear1.bias torch.Size([256])\n",
      "Linear2.weight torch.Size([10, 256])\n",
      "Linear2.bias torch.Size([10])\n"
     ]
    }
   ],
   "source": [
    "net1.apply(init_weights_forced)\n",
    "for name,param in net1.named_parameters():\n",
    "    print(name,param.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the above case we have the data before hand and now we use that datato allow the model itself the way it wants to set the  in_features and the parameters."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The second case requires us to specify the remaining set of parameters when creating the layer. For instance, for dense\n",
    "layers we also need to specify the in_features so that initialization can occur immediately once  called."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1.  Pytorch doesnot provide a inbuilt feature for deferred initialization.\n",
    "2.  Deferred initialization is a good thing. It allows  to set many things automagically and it removes a great\n",
    "    source of errors from defining novel network architectures.\n",
    "3.  We can override this by specifying all implicitly defined variables.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercises"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. What happens if you specify only parts of the input dimensions. Do you still get immediate initialization?\n",
    "2. What happens if you specify mismatching dimensions?\n",
    "3. What would you need to do if you have input of varying dimensionality? Hint - look at parameter tying."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
